The impact of wrong-path memory references in cache-coherent multiprocessor systems
Authors
Abstract
Current-generation high-performance multiprocessor systems are built around out-of-order execution processors with aggressive branch prediction. Despite their relatively high branch prediction accuracy, these processors still execute many memory instructions down mispredicted paths. Previous work that focused on uniprocessors showed that these wrong-path (WP) memory references may pollute the caches and increase the amount of cache and memory traffic. On the positive side, however, they may prefetch data into the caches for memory references on the correct path. While computer architects have thoroughly studied the impact of WP effects in uniprocessor systems, there is no comparable work for multiprocessor systems. In this paper, we explore the effects of WP memory references on the memory system behavior of shared-memory multiprocessor (SMP) systems for both broadcast and directory-based cache coherence. Our results show that these WP memory references can increase the amount of cache-to-cache transfers by 32%, invalidations by 8% and 20% for broadcast and directory-based SMPs, respectively, and the number of writebacks by up to 67% for both systems. In addition to the extra coherence traffic, WP memory references also increase the number of cache line state transitions by 21% and 32% for broadcast and directory-based SMPs, respectively. To reduce the performance impact of these WP memory references, we introduce two simple mechanisms (filtering WP blocks that are not likely to be used, and WP-aware cache replacement) that yield speedups of up to 37%. © 2007 Elsevier Inc. All rights reserved.
Similar resources
Investigating Effects of Wrong-path Memory References in Shared-memory Multiprocessors, by Ayse Yilmazer. A dissertation submitted in partial fulfillment of the requirements for the degree of Master of Science in Electrical and Computer Engineering, University of Rhode Island
High-performance multiprocessor systems are built around out-of-order processors with aggressive branch predictors. Despite their relatively high branch prediction accuracies, these processors execute many memory instructions on mispredicted paths. Previous studies that focused on uniprocessor systems showed that these wrong-path memory references may pollute the caches by bringing in data that...
Investigating the Effects of Wrong-Path Memory References in Shared-Memory Multiprocessor Systems
Uniprocessor studies have shown that wrong-path memory references pollute the caches by bringing in data that are not needed for the correct execution path and by evicting useful data or instructions. Additionally, they also increase the amount of cache and memory traffic. On the positive side, however, they may have a prefetching effect for loads and instructions on the correct path. While the...
Quantifying and Comparing the Impact of Wrong-Path Memory References in Multiple-CMP Systems
Out-of-order execution processors with aggressive branch prediction are the core of current-generation high-performance multiprocessor systems. Despite their relatively high branch prediction accuracies, these processors still execute many memory instructions on the mispredicted path. These wrong-path memory references pollute the caches and increase the amount of memory traffic, but may also p...
Locking Protocols for Large-Scale Cache-Coherent Shared Memory
Significant performance advantages can be realized by implementing a database system on a cache-coherent shared memory multiprocessor. An efficient implementation of a lock manager is a prerequisite for efficient transaction processing in multiprocessor database systems. To this end, we examine two approaches to the implementation of locking in a cache-coherent shared memory multiprocessor database ...
Shared Memory Multiprocessor Architectures for Software IP Routers
In this paper, we propose new shared memory multiprocessor architectures and evaluate their performance for future Internet Protocol (IP) routers based on Symmetric Multi-Processor (SMP) and Cache Coherent Non-Uniform Memory Access (CC-NUMA) paradigms. We also propose a benchmark application suite, RouterBench, which consists of four categories of applications representing key functions on the ...
Journal title:
- J. Parallel Distrib. Comput.
Volume 67, Issue
Pages -
Publication date: 2007